Systems and methods for business analytics management and modeling

ABSTRACT

The present invention relates to systems and methods for model generation. The model is generated by selecting indicators that are relevant to the model, determining a strength score for each of the indicators, ranking the indicators by their strength scores, and bucketizing the indicators. Different permutations of the indicators are then selected for modeling in parallel. The model results are compared, and the ‘best’ model (most historically accurate) is selected for display within a report.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation-in-part and claims priority to a commonly-owned application entitled “Business Performance Forecasting System and Method”, U.S. application Ser. No. 13/558,333, filed on Jul. 25, 2012, recently allowed, which is a non-provisional and claims the benefit of U.S. Provisional Applications No. 61/512,405 filed Jul. 28, 2011 and 61/511,527 filed Jul. 25, 2011.

The present application also is a continuation-in-part and claims priority to a commonly-owned application entitled “Systems and Methods for Forecasting Based upon Time Series Data”, application Ser. No. 15/154,697, filed on May 13, 2016, pending, which is a non-provisional and claims the benefit of U.S. Provisional Applications No. 62/269,978 filed Dec. 19, 2015 and 62/290,441 filed Feb. 2, 2016.

All of the above-referenced applications are incorporated herein in their entirety by this reference.

BACKGROUND

The present invention relates to systems and methods for the management and modeling of business analytics. Business analytics allows for improved insight into the current and future state of industries. These metrics are very useful to business decision makers, investors and operations experts.

Many factors influence the success or failure of a business or other organization. Many of these factors include controllable variables, such as product development, talent acquisition and retention, and securing business deals. However, a significant number of the variables influencing a business' success are external to the organization. These external factors that influence an organization are typically entirely out of the control of the organization, and are often poorly understood or accounted for during business planning. Generally, one of the most difficult variables for a business to account for is the general health of a given business sector.

While these external factors are not necessarily able to be altered, being able to incorporate them into business planning allows a business to better understand their impact on the business, and make strategic decisions that take these external factors into account. This may result in improved business performance, investing decisions, and operational efficiency. However, it has traditionally been very difficult to properly account for, or model, these external factors, let alone generate meaningful forecasts using many different factors in a statistically meaningful and user-friendly way.

For example, many industry outlooks that currently exist are merely opinions of so-called “experts” that may identify one or two factors that impact the industry. While these expert forecasts of industry health have value, they provide a very limited, and often inaccurate, perspective into the industry. Further, these forecasts are generally provided in a qualitative format, rather than as a quantitative measure. For example, the housing industry may be considered “healthy” if the prior year demand was strong and the number of housing starts is up. However, the degree of ‘health’ in the market versus a prior period is not necessarily available or well defined.

As a result, current analytical methods are incomplete, non-quantitative, time-consuming and labor-intensive processes that are inadequate for today's competitive, complex and constantly evolving business landscape.

It is therefore apparent that an urgent need exists for organizational solutions that enable the management and modeling of business analytics. These systems and methods for modeling and managing business analytics enable better organizational and investment functioning.

SUMMARY

To achieve the foregoing and in accordance with the present invention, systems and methods for modeling and management of business analytics are provided. Such systems and methods enable business persons, investors, and industry strategists to better understand the present state of their industries, and more importantly, to have foresight into the future state of their industry.

In some embodiments, a model is generated by a business analytics system. The model is generated by selecting indicators that are relevant to the model, determining a strength score for each of the indicators, ranking the indicators by their strength scores, and bucketizing the indicators. Different permutations of the indicators are then selected for modeling in parallel. The model results are compared, and the ‘best’ model (most historically accurate) is selected for display within a report.

The indicators are determined to be relevant by tags associated with them, which may be searched and selected for inclusion by the user. The strength score for an indicator uses a number of factors, including an R-squared calculation, a procyclic calculation, a minor outlier count, a major outlier count, a seasonality determination, data overlap, the difference in last updated date as a number of periods, the number of models the indicator is included in, the total number of models, and the frequency value of the indicator.

After strength scoring and ranking, each indicator is assigned to one of four buckets: a macroeconomic bucket, a target industry bucket, a demand industry bucket and a miscellaneous bucket. An equal number of indicators is placed in each bucket. The datasets in the macroeconomic bucket are all at a national level, but the datasets in the target industry bucket, the demand industry bucket and the miscellaneous bucket are equal parts national level and local level datasets, if appropriate. When a local level dataset is unavailable or inappropriate, a higher level dataset (such as country or regional) is substituted. Bucket selection is based upon indicator tags and strength scores.

After a model has been generated it may be backtested, edited, and have various factors such as accuracy measures generated for it. This type of data may all be incorporated into the report, which is based upon report templates.

Note that the various features of the present invention described above may be practiced alone or in combination. These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.

BRIEF DESCRIPTION OF THE DRAWINGS

In order that the present invention may be more clearly ascertained, some embodiments will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1A is an example logical diagram of a data management system for business analytics management and modeling, in accordance with some embodiments;

FIG. 1B is a second example logical diagram of a data management system for business analytics management and modeling, in accordance with some embodiments;

FIG. 2 is an example logical diagram of an application server, in accordance with some embodiments;

FIG. 3 is an example logical diagram of an automated modeler, in accordance with some embodiments;

FIG. 4 is a flow chart diagram of an example high level process for business analytics management, in accordance with some embodiments;

FIG. 5 is a flow chart diagram of manual data management, in accordance with some embodiments;

FIG. 6 is a flow chart diagram of automated model creation, in accordance with some embodiments;

FIG. 7A is a flow chart diagram of automated data scoring, in accordance with some embodiments;

FIG. 7B is a flow chart diagram of model editing, in accordance with some embodiments;

FIG. 8 is a flow chart diagram of report generation, in accordance with some embodiments;

FIG. 9 is an example screenshot illustrating a business analytics management splash page, in accordance with some embodiments;

FIG. 10 is an example screenshot illustrating a business analytics management dashboard for a given project, in accordance with some embodiments;

FIG. 11 is an example screenshot illustrating a data management dashboard, in accordance with some embodiments;

FIG. 12 is an example illustration of data buckets for the automated modeling engine, in accordance with some embodiments;

FIG. 13 is an example screenshot illustrating a modeling dashboard, in accordance with some embodiments;

FIG. 14 is an example screenshot illustrating an example model, in accordance with some embodiments;

FIGS. 15-18 are example screenshots illustrating model analytics, in accordance with some embodiments; and

FIGS. 19A and 19B illustrate exemplary computer systems capable of implementing embodiments of the data management and forecasting system.

DETAILED DESCRIPTION

The present invention will now be described in detail with reference to several embodiments thereof as illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent, however, to one skilled in the art, that embodiments may be practiced without some or all of these specific details. In other instances, well known process steps and/or structures have not been described in detail in order to not unnecessarily obscure the present invention. The features and advantages of embodiments may be better understood with reference to the drawings and discussions that follow.

Aspects, features and advantages of exemplary embodiments of the present invention will become better understood with regard to the following description in connection with the accompanying drawing(s). It should be apparent to those skilled in the art that the described embodiments of the present invention provided herein are illustrative only and not limiting, having been presented by way of example only. All features disclosed in this description may be replaced by alternative features serving the same or similar purpose, unless expressly stated otherwise. Therefore, numerous other embodiments of the modifications thereof are contemplated as falling within the scope of the present invention as defined herein and equivalents thereto. Hence, use of absolute and/or sequential terms, such as, for example, “will,” “will not,” “shall,” “shall not,” “must,” “must not,” “only,” “first,” “initially,” “next,” “subsequently,” “before,” “after,” “lastly,” and “finally,” are not meant to limit the scope of the present invention as the embodiments disclosed herein are merely exemplary.

Note that significant portions of this disclosure will focus on the management and modeling of data for businesses. While this is intended as a common use case, it should be understood that the presently disclosed systems and methods are useful for the modeling and management of data based upon any time series data sets, for consumption by any kind of user. For example, the presently disclosed systems and methods could be relied upon by a researcher to predict trends as easily as they are used by a business to forecast sales trends. As such, any time the term ‘business’ is used in the context of this disclosure it should be understood that this may extend to any organization type: individual, investor group, business entity, governmental group, non-profit, religious affiliation, research institution, and the like. Further, references to business analytics, or business models, should be understood to not be limited to commerce, but rather to extend to any situation where such analysis may be needed or desired.

Lastly, note that the following description will be provided in a series of subsections for clarification purposes. These following subsections are not intended to artificially limit the scope of the disclosure, and as such any portion of one section should be understood to apply, if desired, to another section.

I. Data Management Systems for Modeling Business Analytics

The present invention relates to systems and methods for using available data and metrics to generate an entirely new data set through transformations to yield models. While various indices are already known, the presently disclosed systems and methods provide the ability to generate highly accurate models that are forward looking rather than providing merely a snapshot of the current situation. Such systems and methods allow for superior insight into the current and near future health and activity of a given industry sector, product, company or other dimension of interest. This enables better business planning, preparation, and investment, and generally may assist in influencing behaviors in more profitable ways.

To facilitate discussion, FIG. 1A is an example logical diagram of a data management system for business analytics modeling 100. The data analysis system 100 connects a given analyst user 105 through a network 110 to the system application server 115. A database/data repository 120 (or other suitable dataset based upon the forecast sought) is linked to the system application server via connection 121, and the database 120 thus provides access to the data necessary for utilization by the application server 115.

The database 120 is populated with data delivered by and through the data aggregation server 125 via connection 126. Data aggregation server 125 is configured to have access to a number of data sources, for instance external data sources 130 through connection 131. The data aggregation server can also be configured to have access to proprietary or internal data sources, e.g. customer data sources 132, through connection 133. The aggregated data may be stored in a relational database (RDBM) or in big data-related storage facilities (e.g., Hadoop, NoSQL), with its formatting pre-processed to some degree (if desired) to conform to the data format requirement of the analysis component.

Network 110 provides access to the user or data analyst (the user analyst). User analyst 105 will typically access the system through an internet browser, such as Chrome or Mozilla Firefox, or a standalone application, such as an app on tablet 151. As such, the user analyst (as shown by arrow 135) may use an internet connected device such as browser terminal 150, whether a personal computer, mainframe computer, or VT100 emulating terminal. Alternatively, the user analyst may use a mobile device such as a tablet computer 151, smart telephone, or wirelessly connected laptop, whether operated over the internet or other digital telecommunications networks, such as a 4G network. In any implementation, a data connection 140 is established between the terminal (e.g. 150 or 151) through network 110 to the application server 115 through connection 116.

Network 110 is depicted as a network cloud and as such is representative of a wide variety of telecommunications networks, for instance the world wide web, the internet, secure data networks, such as those provided by financial institutions or government entities such as the Department of Treasury or Department of Commerce, internal networks such as local Ethernet networks or intranets, direct connections by fiber optic networks, analog telephone networks, satellite transmission, or any combination thereof.

The database 120 serves as an online available database repository for collected data, including such data as internal metrics. Internal metrics can be comprised of, for instance, financial data of a company or other entity, or data derived from proprietary subscription sources. Economic, demographic, and statistical data are collected from various sources and stored in a relational database that is hosted and maintained by a data analytics provider and made accessible via the internet. The data analytics provider may also arrange for a mirror of the datasets to be available at the company's local IT infrastructure or within a company intranet, which is periodically updated as required.

The application server 115 provides access to a system that provides a set of calculations based on system formulas used to calculate the leading, lagging, coincident, procyclic, acyclic, and counter-cyclic nature of economic, demographic, or statistical data compared to internal metrics, e.g., company financial results, or other external metrics. The system also provides formulas that may be used to calculate a plurality of models based on projected or actual economic, demographic, and statistical data and company financial or sold volume or quantity data. Details of the formulas and processes utilized for the calculation of these models shall be provided in further detail below. These calculations can be displayed by the system in chart or other graphical format.

In some embodiments, changes observed in a metric may also be classified according to their direction of change relative to the indicator that the metric is being measured against. When the metric changes in the same direction as the indicator, the relationship is said to be ‘procyclic’. When the change is in the opposite direction to the indicator, the relationship is said to be ‘countercyclic’. Because it is rare that any two metrics will be fully procyclic or countercyclic, it is also possible that a metric and an indicator can be acyclic, e.g., the metric exhibits both procyclic and countercyclic movement with respect to the indicator.
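The classification described above can be illustrated with a minimal sketch. The disclosure does not specify how the procyclic calculation is performed, so the sketch assumes one plausible reading: period-over-period changes of the two aligned series are compared by sign, and a hypothetical `threshold` parameter decides when the relationship is labeled procyclic or countercyclic rather than acyclic. The function name and return shape are illustrative only.

```python
def classify_cyclicity(metric, indicator, threshold=0.8):
    """Label the relationship between two aligned time series.

    Assumption (not from the disclosure): the procyclic share is the
    fraction of period-over-period moves in which both series change in
    the same direction; `threshold` is a hypothetical cut-off.
    Returns (label, procyclic_share, countercyclic_share).
    """
    if len(metric) != len(indicator):
        raise ValueError("series must be aligned and of equal length")

    same = opposite = 0
    for i in range(1, len(metric)):
        d_metric = metric[i] - metric[i - 1]
        d_indicator = indicator[i] - indicator[i - 1]
        product = d_metric * d_indicator
        if product > 0:        # both moved up or both moved down
            same += 1
        elif product < 0:      # moved in opposite directions
            opposite += 1
        # periods where either series is flat are ignored

    moves = same + opposite
    if moves == 0:
        return "acyclic", 0.0, 0.0

    procyclic = same / moves
    countercyclic = opposite / moves
    if procyclic >= threshold:
        label = "procyclic"
    elif countercyclic >= threshold:
        label = "countercyclic"
    else:
        label = "acyclic"
    return label, procyclic, countercyclic


if __name__ == "__main__":
    print(classify_cyclicity([1, 2, 3, 2, 4], [10, 11, 12, 11, 13]))
```

A series pair that moves together in every period is labeled procyclic, while a pair that agrees in only half of the periods falls below the threshold in both directions and is labeled acyclic.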

The application residing on server 115 is provided access to interact with the customer datasource(s) 132 through the database 120 to perform automatic calculations which identify leading, lagging, and coincident indicators as well as the procyclic, acyclic, and counter-cyclic relationships between customer data and the available economic, demographic, and statistical data. Additionally, the models may be automatically populated on a periodic schedule, e.g. every month. Users 105 of the software applications that can be made available on the application server 115 are able to select and view charts or monitor dashboard modules displaying the results of the calculations performed by the system. In some embodiments, user 105 can select data in the customer repository for use in the calculations that may allow the user to forecast future performance, or tune the business analytics models. The types of indicators and internal data are discussed in more detail in connection with the discourse accompanying the following figures. Alternatively, users can view external economic, demographic, and statistical data only and do not have to interface with internal results, at the option of the user. In yet other embodiments, all internal and external data may be shielded from the user, and only the models and analytics are provided to the user for ease of use.

Data is collected for external indicators and internal metrics of a company through the data aggregation server 125. The formulas built into the application identify relationships between the data. Users 105 can then use the charting components to view the results of the calculations and models. In some embodiments, the data can be entered into the database 120 manually, as opposed to utilizing the data aggregation server 125 and interface for calculation and forecasting. In some embodiments, the users 105 can enter and view any type of data and use the applications to view charts and graphs of the data.

Alternatively, in some systems, users may have sensitive data that must be maintained within the corporate environment. FIG. 1B depicts components of the system in an exemplary configuration to achieve enhanced data security and internal accessibility while maintaining the usefulness of the system and methods disclosed herein. For example, the data management system 101 may be configured in such a manner that the application and aggregation server functions described in connection with FIG. 1A are provided by one or more internal application/aggregation servers 160. The internal server 160 accesses external data sources 180 through metrics database 190, which may have its own aggregation implementation as well. The internal server accesses the metrics database 190 through the web or other such network 110 via connections 162 and 192. The metrics database 190 acquires the appropriate data sets from one or more external sources, as at 180, through connection 182.

The one or more customer data sources 170 may continue to be housed internally and securely within the internal network. The internal server 160 accesses the various internal sources 170 via connection 172, and implements the same type of aggregation techniques described above. The user 105 of the system then accesses the application server 160 with a tablet 151 or other browser software 150 via connections 135 and 140, as in FIG. 1A. External data sources 130 and 180 may be commercial data subscriptions, public data sources, or data entered into an accessible form manually.

FIG. 2 is an example logical diagram of an application server 160 that includes various subcomponents that act in concert to enable a number of functions, including the generation of project dashboards and business analytics models. Generally, the data being leveraged for the generation of models includes economic, demographic, geopolitical, public record and statistical data. In some embodiments, the system utilizes any time series dataset. This time series data, stored in the metrics database 120, is available to all subsystems of the application server 160 for manipulation, transformation, aggregation, and analysis.

The subcomponents of the application server 160 are illustrated as unique modules within the server coupled by a common bus. While this embodiment is useful for clarification purposes, it should be understood that the presently discussed application server may consist of logical subcomponents operating within a single or distributed computing architecture, may include individual and dedicated hardware for each of the enumerated subcomponents, may include hard coded firmware devices within a server architecture, or any permutation of the embodiments described above. Further, it should be understood that the listed subcomponents are not an exhaustive listing of the functionality of the application server 160, and as such more or fewer than the listed subcomponents could exist in any given embodiment of the application server when deployed.

The application server 160 includes a correlation engine 210 which generates initial information regarding the metrics that are utilized for the generation of models. For example, the degree of procyclic or counter-cyclic relationship that the indicator expresses may be determined by the correlation engine 210. Additionally, the correlation engine 210 may determine factors such as the number of major and minor outliers for the given index, the seasonality level of the indicator, the degree of data overlap between the indicator and other indicators, the difference of an indicator in terms of number of periods in the last update, the number of models already leveraging the indicator, and the frequency of the indicator, for example.

The application server 160 also includes a modeler 220. The functionality of the modeler 220 shall be discussed in considerable detail below; however, at its root it allows for the advanced compilation of many indicators (including other published composite metrics and forecasts) and enables unique manipulation of these datasets in order to generate models from any time series datasets. Among the manipulations enabled by the modeler 220 is the ability to visualize, on the fly, the R², procyclic and countercyclic values for each indicator compared to the model. The modeler 220 may also allow for the locking of any indicator's time domain, and the shifting of other indicators with automatic updating of statistical measures. Additionally, the modeler 220 may provide the user with suggestions of suitable indicators, and manipulations to indicators, to ensure a ‘best’ fit between prior actuals and the forecast over the same time period. The ‘best’ fit may include a local maximum optimization of weighted statistical measures. For example, the R², procyclic and countercyclic values could each be assigned a multiplier, and the time domain offset used for any given indicator could be optimized accordingly. The multipliers/weights could, in some embodiments, be user defined.
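One way to picture the weighted time-offset optimization is the following sketch. It is an assumption-laden illustration rather than the disclosed procedure: the score is taken to be a weighted sum of the R² of a one-variable linear fit and the share of periods in which both series move in the same direction, and a small exhaustive search over candidate offsets stands in for the local maximum optimization. The names `best_offset`, `w_r2`, `w_procyclic` and `min_overlap` are hypothetical.

```python
def r_squared_linear(x, y):
    """R-squared of a one-variable least-squares fit (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return (sxy * sxy) / (sxx * syy)


def procyclic_share(x, y):
    """Fraction of period-over-period moves where x and y share direction."""
    steps = len(x) - 1
    same = sum(
        1 for i in range(1, len(x))
        if (x[i] - x[i - 1]) * (y[i] - y[i - 1]) > 0
    )
    return same / steps if steps > 0 else 0.0


def best_offset(target, indicator, max_offset,
                w_r2=0.5, w_procyclic=0.5, min_overlap=3):
    """Return (offset, score) maximizing the weighted score.

    An offset of k pairs indicator[t] with target[t + k], i.e. the
    indicator is treated as leading the target by k periods.
    """
    best_k, best_score = None, float("-inf")
    for k in range(max_offset + 1):
        n = min(len(indicator), len(target) - k)
        if n < min_overlap:
            break  # too little shared history to score this offset
        x, y = indicator[:n], target[k:k + n]
        score = (w_r2 * r_squared_linear(x, y)
                 + w_procyclic * procyclic_share(x, y))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    indicator = [1, 3, 2, 5, 4, 6, 8, 7]
    target = [0, 0, 3, 7, 5, 11, 9, 13]  # follows the indicator two periods later
    print(best_offset(target, indicator, max_offset=3))
```

In the usage example the target tracks the indicator with a two-period delay, so the search settles on an offset of 2. User-defined weights would simply be passed in place of the defaults.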

FIG. 3 provides a more detailed illustration of the modeler 220. Critical to the modeling is the determination of which indicators are to be utilized in a given model. A ‘strength score’ may be calculated for each indicator to assist in this determination. The strength score generator and data ranker 310 consumes the indicator data, along with metrics compiled by the correlation engine 210, to generate the strength score for a given indicator. The indicators are then ranked by their strength scores.

As noted, an R-squared value for each indicator can be calculated. The R-squared calculation is well known, and measures the proportion of the variance in the dependent variable (the modeled variable) that is predictable from the independent variable (here, the indicator). For example, the R-squared value may be calculated by first computing the mean of the observed data (y):

$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_{i}$

The total sum of squares is calculated:

$SS_{tot} = \sum_{i}\left(y_{i} - \bar{y}\right)^{2}$

The regression sum of squares is next computed using predicted values (f):

$SS_{reg} = \sum_{i}\left(f_{i} - \bar{y}\right)^{2}$

The sum of the squares of the residuals is then calculated:

$SS_{res} = \sum_{i}\left(y_{i} - f_{i}\right)^{2} = \sum_{i} e_{i}^{2}$

Lastly, the coefficient of determination (R-squared) is calculated:

$R^{2} = 1 - \frac{SS_{res}}{SS_{tot}}$
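The calculation above translates directly into code. The following is a minimal sketch using plain Python lists; the function name `r_squared` and its argument names are illustrative, with `observed` playing the role of y and `predicted` the role of f.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")

    mean_y = sum(observed) / n                                       # mean of y
    ss_tot = sum((y - mean_y) ** 2 for y in observed)                # total sum of squares
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))  # residual sum of squares

    if ss_tot == 0:
        raise ValueError("R-squared is undefined when the observed data are constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(r_squared([1, 2, 3], [1.1, 1.9, 3.2]))  # approximately 0.97
```

A perfect prediction yields 1.0, and predicting the mean for every period yields 0.0.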

The R-squared value for any given indicator will vary based upon the variable it is being used to model. In addition to the R-squared value, the degree of procyclic relationship between the indicator and the model is received from the correlation engine 210. Also, as noted, the number of minor and major outliers for the indicator is computed. Additionally, the correlation engine will note if an indicator exhibits seasonality. If it does, then that indicator receives a seasonality value of 0.95, whereas a non-seasonal indicator is assigned a value of 1.05. Data overlap between the primary indicator (i.e., the dependent variable) and each of the (explanatory) indicators is provided. The overlap is the number of data points that are shared between the two.

The difference in the last updated date (as a number of periods), the number of models the indicator is used in, the total number of models, and the frequency of the indicator are also received. In some embodiments, the frequency is one of monthly, quarterly, semiannual, or annual. These frequencies are assigned values of 54, 18, 9 or 5, respectively.

In some embodiments, the strength score is determined by first adding together the R-squared and procyclic values. A frequency of use factor is calculated as the number of models the indicator is used in divided by the total number of models, with the result added to one. A last update factor is calculated as the difference in last updated date (in periods) minus two, divided by twenty, with the result subtracted from one. An outlier factor is computed as one minus the number of minor outliers divided by one hundred, minus the number of major outliers divided by twenty. A minimum length factor is lastly calculated as the data overlap divided by the frequency value.

The summed R-squared and procyclic value is multiplied by the seasonality factor, the frequency of use factor, the last update factor, the outlier factor, the minimum length factor and one hundred to generate the strength score. For example, assume an indicator has the following values:

| Parameter | Value |
| R-squared | 0.5593 |
| Procyclic | 0.6032 |
| Minor outlier count | 3 |
| Major outlier count | 1 |
| Seasonality | No (1.05) |
| Difference in last updated date | 2 |
| Number of models using the index | 50 |
| Total number of models | 500 |
| Frequency | Monthly (54) |
| Data overlap | 64 |

This example indicator's strength score would then be calculated as follows:

| Factor | Calculation | Value |
| R-squared + Procyclic | 0.5593 + 0.6032 | 1.1625 |
| Seasonality factor | 1.05 | 1.05 |
| Frequency of use factor | 1 + (50/500) | 1.1 |
| Last updated factor | $1 - \frac{(2 - 2)}{20}$ | 1 |
| Outliers factor | $1 - \frac{3}{100} - \frac{1}{20}$ | 0.92 |
| Minimum length factor | $\frac{64}{54}$ | 1.19 |
| Strength score | 1.1625 × 1.05 × 1.1 × 1 × 0.92 × (64/54) × 100 | ≈ 146 |
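The factor definitions can be gathered into a short sketch. It follows the prose description literally: the R-squared and procyclic values are summed, a non-seasonal indicator receives a seasonality factor of 1.05, and the frequency values are 54, 18, 9 and 5. The function and parameter names are illustrative, and the data overlap of 64 periods is taken from the minimum length factor of the example.

```python
# Frequency values as stated in the description.
FREQUENCY_VALUES = {"monthly": 54, "quarterly": 18, "semiannual": 9, "annual": 5}


def strength_score(r_squared, procyclic, minor_outliers, major_outliers,
                   seasonal, periods_since_update, models_using,
                   total_models, data_overlap, frequency):
    """Strength score per the prose formula (names are illustrative)."""
    seasonality_factor = 0.95 if seasonal else 1.05
    frequency_of_use_factor = 1 + models_using / total_models
    last_update_factor = 1 - (periods_since_update - 2) / 20
    outlier_factor = 1 - minor_outliers / 100 - major_outliers / 20
    minimum_length_factor = data_overlap / FREQUENCY_VALUES[frequency]

    return ((r_squared + procyclic)
            * seasonality_factor
            * frequency_of_use_factor
            * last_update_factor
            * outlier_factor
            * minimum_length_factor
            * 100)


if __name__ == "__main__":
    score = strength_score(
        r_squared=0.5593, procyclic=0.6032,
        minor_outliers=3, major_outliers=1,
        seasonal=False, periods_since_update=2,
        models_using=50, total_models=500,
        data_overlap=64, frequency="monthly",
    )
    print(round(score, 1))
```

Under this reading of the formula, the example inputs evaluate to roughly 146.4. Because every factor enters as a multiplier, a stale update date or a few major outliers scales the whole score down rather than subtracting a fixed penalty.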

The indicators are ranked in order from the highest strength score to the lowest strength score. They are then assigned to one of four ‘buckets’ to be utilized in model generation by a data bucketizer 320. These buckets include macroeconomic datasets, datasets applicable to the specific industry, datasets applicable to the demand industry, and miscellaneous datasets. Apart from the macroeconomic bucket, the datasets in each bucket ideally include local (state level) data as well as country level datasets. When local data is unavailable, country level data may be substituted. The datasets populating each of the buckets may be selected based upon their strength scores and their applicable tags. For example, if the demand industry bucket is for steel, only datasets tagged for steel would be used to populate the bucket.
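A minimal sketch of tag-driven bucketing is shown below. The tag vocabulary (`"macroeconomic"` and the caller-supplied target and demand tags), the `Indicator` record, and the `per_bucket` cap are assumptions made for illustration; the disclosure states only that buckets are filled according to tags and strength scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    tags: frozenset
    strength: float


def bucketize(indicators, target_tag, demand_tag, per_bucket=2):
    """Assign the strongest indicators to four buckets by tag.

    Indicators are visited from highest to lowest strength score, so each
    bucket ends up holding its top `per_bucket` candidates in rank order.
    """
    buckets = {
        "macroeconomic": [],
        "target_industry": [],
        "demand_industry": [],
        "miscellaneous": [],
    }
    ranked = sorted(indicators, key=lambda ind: ind.strength, reverse=True)
    for ind in ranked:
        if "macroeconomic" in ind.tags:
            key = "macroeconomic"
        elif target_tag in ind.tags:
            key = "target_industry"
        elif demand_tag in ind.tags:
            key = "demand_industry"
        else:
            key = "miscellaneous"
        if len(buckets[key]) < per_bucket:
            buckets[key].append(ind)
    return buckets


if __name__ == "__main__":
    candidates = [
        Indicator("gdp", frozenset({"macroeconomic"}), 80.0),
        Indicator("cpi", frozenset({"macroeconomic"}), 70.0),
        Indicator("interest_rates", frozenset({"macroeconomic"}), 60.0),
        Indicator("steel_output", frozenset({"steel"}), 75.0),
        Indicator("auto_sales", frozenset({"automotive"}), 65.0),
        Indicator("rainfall", frozenset({"climate"}), 50.0),
    ]
    result = bucketize(candidates, target_tag="automotive", demand_tag="steel")
    for bucket, members in result.items():
        print(bucket, [m.name for m in members])
```

With a cap of two per bucket, the third-ranked macroeconomic indicator is left out while the lone steel-tagged dataset populates the demand industry bucket.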

A mix of country level and local/state level datasets is selected from each of the buckets, based upon ranks, by a data selector 330. From the selected datasets, various permutations of the data are each run through one or more modeling algorithms by parallel modeling runtime servers 340, to yield multiple model results. These models are then compared to one another, and the best model is selected for output and display by the output selector 350. Models are compared by calculating a weighted average based on the length of each segment within the model and the associated R-squared value. The model with the highest overall average is deemed the best.
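The comparison step can be sketched as a length-weighted average of per-segment R-squared values. The representation of a model as a list of `(segment_length, r_squared)` pairs is an assumption made for illustration, as are the function names.

```python
def weighted_r_squared(segments):
    """Length-weighted average R-squared over a model's segments.

    `segments` is a list of (segment_length, r_squared) pairs.
    """
    total_length = sum(length for length, _ in segments)
    if total_length <= 0:
        raise ValueError("segments must have a positive total length")
    return sum(length * r2 for length, r2 in segments) / total_length


def select_best_model(models):
    """Return the name of the model with the highest weighted average."""
    return max(models, key=lambda name: weighted_r_squared(models[name]))


if __name__ == "__main__":
    candidate_models = {
        "model_a": [(12, 0.9), (24, 0.6)],  # strong on the short segment only
        "model_b": [(12, 0.7), (24, 0.8)],  # stronger on the long segment
    }
    print(select_best_model(candidate_models))
```

In the usage example, model_a averages 0.70 and model_b about 0.77, so model_b is selected: weighting by segment length rewards accuracy over the longer stretch of history.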

Returning to FIG. 2, the application server 160 also includes a workbench manager 230 for the consolidated display of projects, data related to the projects and associated models. The workbench manager 230 keeps track of user access controls, recent activity and data updates. This information may be compiled and displayed to the user in an easily understood interface.

The application server 160 also includes an access controller (not illustrated) to protect various data from improper access. Even within an organization, it may be desirable for various employees or agents to have split access to various sensitive data sources, forecasts or models. Further, within a service or consulting organization, it is very important to separate various clients' data, and role access control prevents this data from being improperly commingled. A projects organization engine 240 may include the access controller, or work in concert with it, in order to provide a consolidated interface where the projects associated with the user are displayed.

An analytics engine 250 may take the output from the modeler 220, and any additional data, and provide analysis of the results. For example, models may be backtested, where the as-of dates may be altered and the resulting forecast is compared against actuals. Contribution information, accuracy measures, and definitions are all provided by the analytics engine 250. This component also allows the users to interact with and edit any model they have access to, and add annotations to the model results. All this information may be compiled in a configurable manner into one or more reports.
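Backtesting by altering the as-of date can be pictured with the sketch below. It is illustrative only: the forecasting model is abstracted as a hypothetical callable `forecaster(history, horizon)`, a naive last-value forecaster stands in for a real model, and mean absolute percentage error is assumed as the accuracy measure since the disclosure does not name one.

```python
def backtest(series, forecaster, as_of_indices, horizon=1):
    """Re-run a forecaster as of earlier dates and score it against actuals.

    Returns a list of (as_of_index, mean_absolute_percentage_error) pairs.
    Only data before each as-of index is visible to the forecaster.
    """
    results = []
    for as_of in as_of_indices:
        history = series[:as_of]
        actuals = series[as_of:as_of + horizon]
        if not history or len(actuals) < horizon:
            continue  # not enough data before or after this as-of point
        forecast = forecaster(history, horizon)
        errors = [abs(f - a) / abs(a) for f, a in zip(forecast, actuals) if a != 0]
        if errors:
            results.append((as_of, sum(errors) / len(errors)))
    return results


def naive_forecaster(history, horizon):
    """Stand-in model: repeat the last observed value."""
    return [history[-1]] * horizon


if __name__ == "__main__":
    sales = [100.0, 110.0, 121.0, 133.1]
    for as_of, mape in backtest(sales, naive_forecaster, as_of_indices=[2, 3]):
        print(as_of, round(mape, 4))
```

Because the series in the usage example grows by ten percent each period, the naive forecast misses each following actual by about 9.09 percent at both as-of points.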

Lastly, a publisher 260 allows for the reports generated by theanalytics engine 250, and/or specific models generated via the modeler220 to be published, with appropriate access controls, for visualizationand manipulation by the users or other intended audiences.

By automating an otherwise time-consuming and labor-intensive process,the above-described data management system for generating models offersmany advantages, including the generation of highly relevant andaccurate models that are forward looking, and the ability to directlycompare the models to historical actual data. In addition, theapplication server no longer requires user expertise. The result issubstantially reduced user effort needed for the generation of timelyand accurate business analytics reports.

Now that the systems for data management for generating models have been described in considerable detail, attention will be turned towards methods of operation in the following subsection.

II. Data Management and Modeling Methods

To facilitate the discussion, a series of flowcharts are provided. FIG. 4 is a flow chart diagram of an example high level process 400 for data management and business analytics reporting. In this example process, the user of the system initially logs in (at 410) using a user name and password combination, biometric identifier, physical or software key, or other suitable method for accessing the system with a defined user account. The user account enables proper access control to datasets to ensure that data is protected within an organization and between organizations.

The user role access is confirmed (at 420) and the user is able to search and manipulate appropriate datasets. This allows the user to access project dashboards (at 430) for enhanced analysis of a given project. An example of a project splashpage 900 is provided in relation to FIG. 9, which a user may be routed to after login. In this splashpage the user is identified/greeted. Only projects the user is authorized to access are provided on the left hand side of this example splashpage. Permissions for sharing projects are also enabled from this screen, as well as the creation or deletion of projects. For the purposes of this discussion, a “project” requires permissions to access (either view-only or editing permission). Data and reports are project agnostic, and therefore may be accessed without attendant permissions.

Moving to FIG. 10, a specific project, the ‘California Raisins’, has been selected from the splashpage and opens a project specific page 1000. Again, access and permissions may be adjusted on this project page, as well as the initiation of individual tasks such as model, workbench and analytics generation (shown as the tradename ‘ERIN’). Additionally, individual workbench data manipulations and modeling jobs may be accessed on this screen. The number of each activity for the given project is also displayed, allowing the user to rapidly know the state of the attendant project. For instance, in this example project there are six ERIN jobs, seven workbenches and seven models. The most recently accessed jobs are also displayed to the user to allow for rapid access to where they may have last left off.

Returning to FIG. 4, after the project dashboards have been accessed the user may decide to manage the data manually or via automated indicator scoring (at 440). FIG. 5 provides a flow diagram of the manual data management/selection process 450 in greater detail. Data is added as a primary indicator to a given workbench (at 510), and may additionally be added as a primary indicator to a given model (at 520). Data is determined to be a primary indicator by the user. Selection of the primary indicator may employ the user searching for a specific dataset using a keyword search and/or using predefined metadata tags. The matching datasets are presented to the user for selection. The search results are ordered by best match to the keyword and/or tags as well as by alternate metrics, such as popularity of a given indicator (used in many other forecast models), quality of indicator data, or frequency of indicator data being updated. Search results may further be sorted and filtered by certain characteristics of the data series, for instance, by region, industry, category, attribute, or the like. In some cases, search display may depend upon a weighted algorithm of any combination of the above factors. In addition to utilizing all or some of the above factors for displaying search results, some embodiments of the method may generate suggestions for indicators to the user independent of the search feature. Likewise, when a user selects an indicator, the system may be able to provide alternate recommendations of ‘better’ indicators based on any of the above factors.

FIG. 11 provides an example screenshot of a dashboard 1100 for project data. This dashboard provides data that has been loaded into the project, and any primary indicators associated with the project. The user is able to add data to the project, and assign any data as primary to a workbench and/or a model associated with the project. For example, in this illustration, California weather events is shown as a data indicator that may be employed. The last modification date, number of data points in the indicator, source and associated tags are each displayed.

Returning to FIG. 4, after data has been manually managed (or if automatic management is desired) the models may be managed (at 460) using the data that has been identified and organized. FIG. 6 provides greater detail into this model management process. Initially, data is received for the model (at 605). The model horizon (length of model), demand industry, target industry and locale are all received (at 610), which correspond to the bucketized datasets already identified. The output format requirements are also received (at 620). The datasets from the applicable buckets (already identified in the prior process) are selected (at 630), and the data may be automatically scored (at 635). FIG. 7A provides greater detail of this automated process for scoring indicators for automated selection. In this process, the datasets are normalized (at 705). This may include transforming all the datasets into a percent change over a given time period, an absolute dollar amount over a defined time period, or the like. Likewise, periods of time may also be normalized, such that the analysis window for all factors is equal.
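
By way of non-limiting illustration, the normalization at 705 may be sketched as follows. The function names `percent_change` and `align_windows`, the handling of a zero prior value, and the choice to keep the most recent common window are assumptions made for this sketch and are not taken from the disclosure.

```python
def percent_change(series):
    """Convert a list of levels into period-over-period percent changes.

    A change from a zero value is undefined and is recorded as None
    (an assumption of this sketch).
    """
    changes = []
    for prev, curr in zip(series, series[1:]):
        if prev == 0:
            changes.append(None)
        else:
            changes.append((curr - prev) / prev * 100.0)
    return changes


def align_windows(datasets):
    """Trim every dataset to the most recent common number of periods,
    so that the analysis window is equal for all indicators."""
    n = min(len(values) for values in datasets.values())
    return {name: values[len(values) - n:] for name, values in datasets.items()}


if __name__ == "__main__":
    raw = {"housing_starts": [100, 110, 99, 120], "rainfall": [5, 6, 3]}
    normalized = {name: percent_change(values) for name, values in raw.items()}
    print(align_windows(normalized))
```

In this sketch each dataset is first expressed as a percent change and is then trimmed so that every indicator covers the same number of periods.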

The next step for the automated score generation is the computation of a strength score (at 715) as discussed previously in considerable detail. As noted before, strength scores are dependent upon R-squared, procyclic calculation and a number of additional factors such as numbers of outliers, numbers of models, frequency of updates, etc. These factors are used to calculate metrics (e.g., seasonality factor, frequency of use factor, etc.) which are then multiplied together to generate a final score.
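
By way of non-limiting illustration, the multiplication of factors may be sketched as follows, using the factor values recited in the strength score formula of the claims. The function and parameter names are hypothetical, and the unit of the "difference in last updated" term is not specified in this section, so `update_difference` is simply passed through as a number.

```python
# Frequency values as recited in the claims.
FREQUENCY_VALUE = {"monthly": 54, "quarterly": 18, "semiannual": 9, "annual": 5}


def strength_score(r_squared, procyclic, seasonal, models_with_indicator,
                   total_models, update_difference, minor_outliers,
                   major_outliers, data_overlap, frequency):
    """Multiply the individual factors into a single strength score."""
    seasonality = 0.95 if seasonal else 1.05           # seasonality determination
    usage = models_with_indicator / total_models + 1   # frequency-of-use factor
    freshness = 1 - (update_difference - 2) / 20       # last-update factor
    outliers = 1 - minor_outliers / 100 - major_outliers / 20
    coverage = data_overlap / FREQUENCY_VALUE[frequency]
    return ((r_squared + procyclic) * seasonality * usage
            * freshness * outliers * coverage * 100)


if __name__ == "__main__":
    score = strength_score(r_squared=0.6, procyclic=0.3, seasonal=True,
                           models_with_indicator=5, total_models=20,
                           update_difference=4, minor_outliers=3,
                           major_outliers=1, data_overlap=48,
                           frequency="monthly")
    print(round(score, 2))
```

Because every factor is a multiplier, a single weak factor (for example, many major outliers) lowers the final score regardless of how strong the R-squared value is.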

The various indicators are then bucketized into one of four buckets (at 725). The historical length of the dataset that matches the geographic location being modeled is divided by five and then multiplied by the number of buckets (four) to generate the total number of datasets for the model. The process helps to ensure that the model has an appropriate number of indicators and an even mix of statistically significant economic, industry-specific and geography-specific indicators. For example, if 40 months of dataset information were available, this would be divided by five and multiplied by four to generate 32, the number of datasets for this model. This results in 8 datasets available for each bucket. The datasets in each bucket may then be ranked by their strength scores (at 735). FIG. 12 provides an example illustration 1200 of the four buckets. These include a macroeconomic bucket having 8 national datasets (for this specific example). The second bucket is the target industry, which is divided into 4 national datasets and 4 local (i.e., state level) datasets. Alternate definitions of locality may be employed when sufficiently granular indicators are available. For example, in some embodiments, the buckets may include state and county level datasets, or even city level datasets. Likewise, the demand industry bucket and miscellaneous bucket include combinations of local and national datasets. When local data is not available for any given dataset, it may be possible to substitute a higher level dataset in its place. For example, if there is no state level data for a given dataset, but there is country level data and regional data (e.g., New England, Midwest, etc.), then applicable data may be substituted from this larger geographic area for the missing local information.
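
By way of non-limiting illustration, the bucket sizing and ranking may be sketched as follows. The bucket labels, the function names, and the use of integer division when the history length is not a multiple of five are assumptions of this sketch.

```python
BUCKETS = ("macroeconomic", "target_industry", "demand_industry", "miscellaneous")


def bucket_sizes(history_length):
    """Return (total datasets, datasets per bucket) for a history length."""
    total = history_length // 5 * len(BUCKETS)
    return total, total // len(BUCKETS)


def fill_buckets(indicators, per_bucket):
    """Rank indicators by strength score within each bucket and keep the top ones.

    indicators: iterable of (name, bucket, strength_score) tuples.
    """
    grouped = {bucket: [] for bucket in BUCKETS}
    for name, bucket, score in indicators:
        grouped[bucket].append((name, score))
    return {
        bucket: [name for name, _ in
                 sorted(items, key=lambda item: item[1], reverse=True)[:per_bucket]]
        for bucket, items in grouped.items()
    }


if __name__ == "__main__":
    total, per_bucket = bucket_sizes(40)
    print(total, per_bucket)  # 32 datasets in total, 8 per bucket
```

For the 40 month example, 40 divided by five is 8, and 8 multiplied by four buckets is 32 datasets, or 8 per bucket.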

Returning to FIG. 6, the total number of datasets is then permutated into combinations and run in parallel modeling processes (at 640). This parallel modeling results in a number of output model results, which are then compared against past actuals to calculate the weighted predictive R-squared values for each model (at 650). The model with the highest R-squared score is then selected (at 660) and used for viewing (at 670) and additional analytics such as backtesting (at 680) and editing (at 690). Editing, as seen in greater detail in relation to FIG. 7B, may include editing pre-adjustment factors (at 710), post-adjustment factors (at 720), indicator weights (at 730) and/or indicator time offsets (at 740). Pre-adjustment factors and post-adjustment factors are multipliers to the forecast and/or indicators that account for some anomaly in the data. For example, a major snowstorm impacting the eastern seaboard may have an exaggerated impact upon heating costs in the region. If the forecast is for global demand for heating oil, this unusual event may skew the final forecast. An adjustment factor may be leveraged in order to correct for such events. The weight may be any positive or negative number, and is a multiplier against the indicator to vary the influence of the indicator in the final model. A negative weight will reverse procyclic and countercyclic indicators. Determining whether an indicator relationship exists between two data series, as well as the nature and characteristics of such a relationship, if found, can be a very valuable tool. Armed with the knowledge, for example, that certain macroeconomic metrics are predictors of future internal metrics, business leaders can adjust internal processes and goals to increase productivity, profitability, and predictability. The time offset allows the user to move the time domain of any indicator relevant to the forecast. For example, in the above example of global heating oil, the global temperature may have a thirty day lag in reflecting in heating oil prices. In contrast, refining capacity versus crude supply may be a leading indicator of the heating oil prices. These two example indicators would be given different time offsets in order to refine the forecast.
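
By way of non-limiting illustration, the parallel modeling at 640 and the selection at 650 and 660 may be sketched as follows. This is a simplified stand-in: the placeholder model (a per-period average of the chosen indicators), the use of a plain rather than weighted predictive R-squared, the use of a thread pool rather than separate processes, and all function names are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def r_squared(actual, predicted):
    """Plain coefficient of determination against past actuals."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot


def fit_and_score(subset, indicators, actual):
    """Placeholder model: the forecast is the per-period mean of the subset."""
    series = [indicators[name] for name in subset]
    predicted = [sum(values) / len(values) for values in zip(*series)]
    return subset, r_squared(actual, predicted)


def select_best(indicators, actual, size):
    """Score every combination of `size` indicators in parallel, keep the best."""
    subsets = list(combinations(sorted(indicators), size))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda subset: fit_and_score(subset, indicators, actual), subsets))
    return max(results, key=lambda result: result[1])


if __name__ == "__main__":
    indicators = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 2, 3, 5]}
    print(select_best(indicators, actual=[1, 2, 3, 4], size=1))
```

Each combination is scored independently of the others, which is what allows the candidate models to be run side by side before the comparison step.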

For any forecast indicator, an R² value, procyclic value and countercyclic value are generated in real time for any given weight and time offset. These statistical measures enable the user to tailor their model according to their concerns. In some embodiments the weights and offsets for the indicators may be auto-populated by the method with suggested values.
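
By way of non-limiting illustration, recomputing a fit statistic for a chosen weight and time offset may be sketched as follows. The sketch reports only a Pearson correlation and its square as a stand-in for the R² value; the procyclic and countercyclic calculations are not defined in this section and are not reproduced here. The function names and the convention that a positive offset means the indicator leads the target are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def apply_offset(indicator, target, offset):
    """Pair indicator[t] with target[t + offset]."""
    if offset >= 0:
        x, y = indicator[:len(indicator) - offset], target[offset:]
    else:
        x, y = indicator[-offset:], target[:len(target) + offset]
    n = min(len(x), len(y))
    return x[:n], y[:n]


def fit_stats(indicator, target, weight=1.0, offset=0):
    """Return (correlation, R-squared) for a weighted, time-shifted indicator."""
    x, y = apply_offset(indicator, target, offset)
    r = pearson([weight * value for value in x], y)
    return r, r * r


if __name__ == "__main__":
    indicator = [1, 3, 2, 5, 4, 6]
    target = [0, 1, 3, 2, 5, 4]   # follows the indicator one period later
    for offset in (0, 1):
        print(offset, fit_stats(indicator, target, offset=offset))
```

In this sketch a negative weight flips the sign of the correlation while leaving its square unchanged, and shifting the offset to the true lag raises the fit.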

Modeling formulas may be configured using machine learning, or expert input. These models may be leveraged by a typical user without any additional interaction. However, for a more advanced user, it may be desirable to allow editing of model formulas. In some embodiments, the formula is freeform, allowing the user to tailor the formula however desired. In alternate embodiments, the formula configuration includes a set of discrete transformations, including providing each indicator with a weight, and allowing the indicators to be added/subtracted and/or multiplied or divided against any other single or group of indicators.

FIG. 13 provides an example screenshot of a model dashboard 1300 where the models associated with a project are provided to the user. Here the user is able to edit, delete, copy or create additional models. Models may also be opened for viewing, tagged for searching, moved to other projects and associated with specific reports.

When a model is opened, a dashboard 1400 is presented to the user for the specific model, as seen in the example illustration of FIG. 14. The most relevant information to the model is provided to the user in this example, and specific data may be toggled as being visible or not. For instance, in this example screenshot, the raw forecast is set as being visible, which is presented as a graph with a solid line for past forecast, and a dotted line for future forecast. Information, such as what is being forecasted, target datasets and key parameters, is also displayed. Significant amounts of additional data may be selected for display in the model, but are not shown in this example, such as accuracy measures, risk, relative importance of indicators, etc. In some embodiments, any individual with project permissions may access and view the model.

In FIG. 15, another example screenshot of the model dashboard 1500 is seen. In this example, the only difference is that accuracy data has been selected as being visible. In the top graph, the as-of date has been shifted one year back, and the actual line (lighter dotted line) is shown in comparison to the forecasted line (darker dotted line). Note that the initial values are very close to one another, but the further into the future, the larger the discrepancy between the actual values and forecasted values becomes. Below the plot, the single period error is provided in histogram format. Likewise, aggregate error is provided in a separate histogram.
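
By way of non-limiting illustration, the two error series behind such histograms may be sketched as follows. The disclosure does not define the error calculations, so this sketch assumes that the single period error is the percent error of each period taken alone and that the aggregate error is the percent error of the running totals; the function name is hypothetical and nonzero actual values are assumed.

```python
def backtest_errors(actual, forecast):
    """Return (single period errors, aggregate errors), both in percent.

    actual and forecast cover the periods after the shifted as-of date.
    """
    single, aggregate = [], []
    cumulative_actual = cumulative_forecast = 0.0
    for a, f in zip(actual, forecast):
        single.append((f - a) / a * 100.0)
        cumulative_actual += a
        cumulative_forecast += f
        aggregate.append(
            (cumulative_forecast - cumulative_actual) / cumulative_actual * 100.0)
    return single, aggregate


if __name__ == "__main__":
    single, aggregate = backtest_errors([100.0, 100.0, 100.0], [101.0, 104.0, 110.0])
    print(single)
    print(aggregate)
```

Under these assumptions, over-forecasts and under-forecasts in different periods can offset one another in the aggregate series even when each single period error is large.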

Returning to FIG. 4, after model management, reports may be generated (at 470) using the various models and these models may be published (at 480). Report generation includes compiling desired model metrics together. Report generation may additionally include analysis of the models. For example, FIG. 8 provides a more detailed flowchart for the process of analyzing a model forecast. Initially the primary indicator is charted overlying each explanatory indicator (at 810). This charting allows a user to rapidly ascertain, using visual cues, the relationship between the primary indicator and each given metric. Humans are very visual, and being able to graphically identify trends is often much easier than using numerical data sets. In addition to the graphs, the R², procyclic values, and countercyclic values may be presented (at 820) alongside the charted indicators.

Where the current method is particularly potent is its ability to rapidly shift the time domains, on the fly, of any of the indicators to determine the impact this has on the forecast. In some embodiments, one or more time domain adjusters may be utilized to alter the time domain of indicators, and alter and redefine the time domain in which the selected metrics for a report are displayed. Additionally, the time domain of any given indicator may be locked (at 830) such that if an indicator is locked (at 840) any changes to the time domain will only shift for non-locked indicators. Upon a shift in the time domain, the charts that are locked are kept static (at 850) as the other graphs are updated.
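
By way of non-limiting illustration, the locking behavior at 830 through 850 may be sketched as follows. The function name, the representation of each time domain as an integer offset in periods, and the example indicator names are assumptions of this sketch.

```python
def shift_time_domains(offsets, locked, delta):
    """Shift every non-locked indicator by `delta` periods.

    offsets: mapping of indicator name to its current time offset.
    locked:  set of indicator names whose time domain is held static.
    """
    return {
        name: offset if name in locked else offset + delta
        for name, offset in offsets.items()
    }


if __name__ == "__main__":
    offsets = {"global_temperature": -1, "refining_capacity": 2}
    print(shift_time_domains(offsets, locked={"refining_capacity"}, delta=3))
```

Only the charts for indicators whose offset changed would need to be redrawn; the locked charts keep their prior offset.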

In addition to presenting the graphs comparing indicators to the forecast, in some embodiments, the forecast may be displayed versus actual values (for the past time period). Trends for the forecast are likewise displayed, as well as the future forecast values (at 860). Forecast horizon, mean absolute percent error, and additional statistical accuracy measures for the forecast may also be provided (at 870). Lastly, the eventual purpose of the generation of the forecast is to modify user or organization behaviors (at 880).
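
By way of non-limiting illustration, the mean absolute percent error provided at 870 may be computed as sketched below. The function name and the choice to skip periods whose actual value is zero are assumptions of this sketch.

```python
def mean_absolute_percent_error(actual, forecast):
    """Mean of |actual - forecast| / |actual| over the past periods, in percent.

    Periods with a zero actual value are skipped because the ratio is undefined.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return sum(abs((a - f) / a) for a, f in pairs) / len(pairs) * 100.0


if __name__ == "__main__":
    print(mean_absolute_percent_error([100.0, 200.0, 400.0], [110.0, 180.0, 400.0]))
```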

In some embodiments, modifying behaviors may be dependent upon the user to formulate and implement. In advanced embodiments, suggested behaviors based upon the outlook scores (such as commodity hedging, investment trends, or securing longer or shorter term contracts) may be automatically suggested to the user for implementation. In these embodiments, the system utilizes rules regarding the user, or organization, related to objectives or business goals. These rules/objectives are cross referenced against the outlook scores, and advanced machine learning algorithms may be employed in order to generate the resulting behavior modification suggestions. In some other embodiments, the user may configure state machines in order to leverage outlook scores to generate these behavior modification suggestions. Lastly, in even further advanced embodiments, in addition to the generation of these suggestions, the system may be further capable of acting upon the suggestions autonomously. In some of these embodiments, the user may configure a set of rules under which the system is capable of autonomous activity. For example, the outlook score may be required to have an accuracy above a specific threshold, and the action may be limited to a specific dollar amount.
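
By way of non-limiting illustration, a user-configured rule of this kind may be sketched as follows. The class name, field names, and example values are hypothetical and serve only to show an accuracy threshold combined with a dollar limit.

```python
from dataclasses import dataclass


@dataclass
class AutonomyRule:
    """Conditions under which the system may act without user confirmation."""
    min_accuracy: float   # the outlook score accuracy must exceed this value
    max_dollars: float    # the action may not exceed this dollar amount


def may_act(rule, accuracy, dollars):
    """Return True only when both configured conditions are satisfied."""
    return accuracy > rule.min_accuracy and dollars <= rule.max_dollars


if __name__ == "__main__":
    rule = AutonomyRule(min_accuracy=0.90, max_dollars=50_000)
    print(may_act(rule, accuracy=0.93, dollars=20_000))   # within both limits
    print(may_act(rule, accuracy=0.93, dollars=75_000))   # exceeds dollar limit
```

A suggestion that fails either condition would fall back to being presented to the user for manual implementation.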

Turning to FIG. 16, an example screenshot 1600 of an edited model for inclusion in a report is provided. The left sidebar allows the viewer to edit and interact with the report. “Snapshots” of specific information, such as accuracy reports, may be saved by the user and included in the report. FIG. 17 is an example screenshot 1700 of such a ‘snapshot’. Snapshots also allow for on-the-fly alternative testing of a model without the need to copy the entire model. Model editing is restricted to individuals with appropriate access, but allows for the production of report features that can be disseminated more broadly.

FIG. 18 provides an example screenshot of a report 1800, which is fully defined by the user. Components from model view screens may be incorporated into the reports. Templates for reports may be accessed by the user to assist in the generation of aesthetically pleasing and informative reports.

III. System Embodiments

Now that the systems and methods for the generation of models and management of these models and data have been described, attention shall now be focused upon systems capable of executing the above functions. To facilitate this discussion, FIGS. 19A and 19B illustrate a Computer System 1900, which is suitable for implementing embodiments of the present invention. FIG. 19A shows one possible physical form of the Computer System 1900. Of course, the Computer System 1900 may have many physical forms ranging from a printed circuit board, an integrated circuit, and a small handheld device up to a huge super computer. Computer System 1900 may include a Monitor 1902, a Display 1904, a Housing 1906, a Disk Drive 1908, a Keyboard 1910, and a Mouse 1912. Disk 1914 is a computer-readable medium used to transfer data to and from Computer System 1900.

FIG. 19B is an example of a block diagram for Computer System 1900. Attached to System Bus 1920 are a wide variety of subsystems. Processor(s) 1922 (also referred to as central processing units, or CPUs) are coupled to storage devices, including Memory 1924. Memory 1924 includes random access memory (RAM) and read-only memory (ROM). As is well known in the art, ROM acts to transfer data and instructions uni-directionally to the CPU and RAM is used typically to transfer data and instructions in a bi-directional manner. Both of these types of memories may include any suitable form of the computer-readable media described below. A Fixed medium 1926 may also be coupled bi-directionally to the Processor 1922; it provides additional data storage capacity and may also include any of the computer-readable media described below. Fixed medium 1926 may be used to store programs, data, and the like and is typically a secondary storage medium (such as a hard disk) that is slower than primary storage. It will be appreciated that the information retained within Fixed medium 1926 may, in appropriate cases, be incorporated in standard fashion as virtual memory in Memory 1924. Removable Disk 1914 may take the form of any of the computer-readable media described below.

Processor 1922 is also coupled to a variety of input/output devices, such as Display 1904, Keyboard 1910, Mouse 1912 and Speakers 1930. In general, an input/output device may be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, motion sensors, brain wave readers, or other computers. Processor 1922 optionally may be coupled to another computer or telecommunications network using Network Interface 1940. With such a Network Interface 1940, it is contemplated that the Processor 1922 might receive information from the network, or might output information to the network in the course of performing the above-described generation of models. Furthermore, method embodiments of the present invention may execute solely upon Processor 1922 or may execute over a network such as the Internet in conjunction with a remote CPU that shares a portion of the processing.

Software is typically stored in the non-volatile memory and/or the drive unit. Indeed, for large programs, it may not even be possible to store the entire program in the memory. Nevertheless, it should be understood that for software to run, if necessary, it is moved to a computer readable location appropriate for processing, and for illustrative purposes, that location is referred to as the memory in this disclosure. Even when software is moved to the memory for execution, the processor will typically make use of hardware registers to store values associated with the software, and local cache that, ideally, serves to speed up execution. As used herein, a software program is assumed to be stored at any known or convenient location (from non-volatile storage to hardware registers) when the software program is referred to as “implemented in a computer-readable medium.” A processor is considered to be “configured to execute a program” when at least one value associated with the program is stored in a register readable by the processor.

In operation, the computer system 1900 can be controlled by operating system software that includes a file management system, such as a disk operating system. One example of operating system software with associated file management system software is the family of operating systems known as Windows® from Microsoft Corporation of Redmond, Wash., and their associated file management systems. Another example of operating system software with its associated file management system software is the Linux operating system and its associated file management system. The file management system is typically stored in the non-volatile memory and/or drive unit and causes the processor to execute the various acts required by the operating system to input and output data and to store data in the memory, including storing files on the non-volatile memory and/or drive unit.

Some portions of the detailed description may be presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is, here and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the methods of some embodiments. The required structure for a variety of these systems will appear from the description below. In addition, the techniques are not described with reference to any particular programming language, and various embodiments may, thus, be implemented using a variety of programming languages.

In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a client-server network environment or as a peer machine in a peer-to-peer (or distributed) network environment.

The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a laptop computer, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, an iPhone, a Blackberry, a processor, a telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.

While the machine-readable medium or machine-readable storage medium is shown in an exemplary embodiment to be a single medium, the terms “machine-readable medium” and “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The terms “machine-readable medium” and “machine-readable storage medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the presently disclosed technique and innovation.

In general, the routines executed to implement the embodiments of the disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.” The computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer, and when read and executed by one or more processing units or processors in a computer, cause the computer to perform operations to execute elements involving the various aspects of the disclosure.

Moreover, while embodiments have been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments are capable of being distributed as a program product in a variety of forms, and that the disclosure applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution.

While this invention has been described in terms of several embodiments, there are alterations, modifications, permutations, and substitute equivalents, which fall within the scope of this invention. Although sub-section titles have been provided to aid in the description of the invention, these titles are merely illustrative and are not intended to limit the scope of the present invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, modifications, permutations, and substitute equivalents as fall within the true spirit and scope of the present invention.

What is claimed is:
 1. A computerized method for generating a model, useful in association with a business analytics system, the method comprising: selecting a plurality of indicators relevant to a model; generating a strength score for each of the plurality of indicators using a processor that calculates a R-squared value and procyclic values for each indicator to derive the strength scores by adding the R-squared value and procyclic values and multiplying the results by a seasonality determination, a normalized fraction of a number of model indicators over a total number of models, a normalized difference in a last update, a normalized difference between a minor outlier count and a major outlier count, and a fraction of data overlap over a frequency value of the indicator; ranking the plurality of indicators by their strength score; bucketizing the plurality of ranked indicators; configuring a plurality of models using machine learning; modeling various permutations of the bucketized indicators in parallel to generate a plurality of model forecasts; selecting one of the plurality of model forecasts with the highest accuracy; and generating a report using the selected one model forecast.
 2. The method of claim 1, wherein the strength score is calculated by: (R-squared calculation + procyclic calculation) × (the seasonality determination) × ((the number of models the indicator is included / the total number of models) + 1) × (1 − ((difference in the last updated − 2)/20)) × (1 − (minor outlier count/100) − (major outlier count/20)) × (data overlap / frequency value of the indicator) × 100, wherein the seasonality determination is 0.95 for a seasonality indicator and 1.05 for a non-seasonality indicator, and wherein the frequency value of the indicator is 54 for monthly, 18 for quarterly, 9 for semiannually, and 5 for annually.
 3. The method of claim 1, wherein the bucketizing includes dividing the total number of data points available for the model by five and multiplying by four buckets to generate a total number of data sets.
 4. The method of claim 3, wherein the total number of data sets are divided evenly into a macroeconomic bucket, a target industry bucket, a demand industry bucket and a miscellaneous bucket.
 5. The method of claim 4, wherein the data sets in the macroeconomic bucket are national level, and the datasets in the target industry bucket, the demand industry bucket and the miscellaneous bucket are equal parts national level and local level datasets, when appropriate.
 6. The method of claim 5, wherein higher level datasets are substituted when local level datasets are unavailable.
 7. The method of claim 4, wherein the datasets are divided into the four buckets by tag information and the strength scores.
 8. The method of claim 1, wherein the report generating includes backtesting the model.
 9. The method of claim 1, wherein report generating includes calculating a single period error rate and an aggregate period error for the model.
 10. The system of claim 1, wherein report generating includes calculating a single period error rate and an aggregate period error for the model.
 11. A computerized system for generating a model comprising: a data aggregation server for selecting a plurality of indicators relevant to a model; a data analyzer for generating a strength score for each of the plurality of indicators using a processor that calculates a R-squared value and procyclic values for each indicator to derive the strength scores by adding the R-squared value and procyclic values and multiplying the results by a seasonality determination, a normalized fraction of a number of model indicators over a total number of models, a normalized difference in a last update, a normalized difference between a minor outlier count and a major outlier count, and a fraction of data overlap over a frequency value of the indicator; a modeling engine for configuring a plurality of models using machine learning, modeling various permutations of the bucketized indicators in parallel to generate a plurality of model forecasts, and selecting one of the plurality of model forecasts with the highest accuracy; and a reporting module for generating a report using the selected one model forecast.
 12. The system of claim 11, wherein the strength score is calculated by: (R-squared calculation + procyclic calculation) × (the seasonality determination) × ((the number of models the indicator is included / the total number of models) + 1) × (1 − ((difference in last updated − 2)/20)) × (1 − (minor outlier count/100) − (major outlier count/20)) × (data overlap / frequency value of the indicator) × 100, wherein the seasonality determination is 0.95 for a seasonality indicator and 1.05 for a non-seasonality indicator, and wherein the frequency value of the indicator is 54 for monthly, 18 for quarterly, 9 for semiannually, and 5 for annually.
 13. The system of claim 11, wherein the bucketizing includes dividing the total number of data points available for the model by five and multiplying by four buckets to generate a total number of data sets.
 14. The system of claim 13, wherein the total number of data sets are divided evenly into a macroeconomic bucket, a target industry bucket, a demand industry bucket and a miscellaneous bucket.
 15. The system of claim 14, wherein the data sets in the macroeconomic bucket are national level, and the datasets in the target industry bucket, the demand industry bucket and the miscellaneous bucket are equal parts national level and local level datasets, when appropriate.
 16. The system of claim 15, wherein higher level datasets are substituted when local level datasets are unavailable.
 17. The system of claim 14, wherein the datasets are divided into the four buckets by tag information and the strength scores.
 18. The system of claim 11, wherein the report generating includes backtesting the model.